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BOUND-CONSTRAINED POLYNOMIAL OPTIMIZATION USING ONLY 
ELEMENTARY CALCULATIONS 

ETIENNE DE KLERK, JEAN B. LASSERRE, MONIQUE LAURENT, AND ZHAO SUN 


Abstract. We provide a monotone non-increasing sequence of upper bounds (k > 1) con¬ 
verging to the global minimum of a polynomial / on simple sets like the unit hypercube in 
The novelty with respect to the converging sequence of upper bounds in [J.B. Lasserre, A new look 
at nonnegativity on closed sets and polynomial optimization, SIAM J. Optim. 21, pp. 864—885, 
2010] is that only elementary computations are required. For optimization over the hypercube 
[0, !]”■, we show that the new bounds have a rate of convergence in 0{l/y/k). Moreover we 
show a stronger convergence rate in 0(1/k) for quadratic polynomials and more generally for 
polynomials having a rational minimizer in the hypercube. In comparison, evaluation of all ratio¬ 
nal grid points with denominator k produces bounds with a rate of convergence in 0{l/k^), but 
at the cost of 0{k'^) function evaluations, while the new bound needs only 0{n^) elementary 
calculations. 


1. Introduction 

Consider the problem of computing the global minimum 
(1-1) /min,K = min{/(x) : xS/C}, 

of a polynomial / on a compact set 1C C M". (We will mainly deal with the case where /C is a basic 
semi-algebraic set.) 

A fruitful perspective, introduced by Lasserre m, is to reformulate problem (HID as 


/min./c = inf / fd^J,, 

Jk 


where the infimum is taken over all probability measures /r with support in /C. Using this re¬ 
formulation one may obtain a sequence of lower hounds on /min,ic that converges to /min.K, by 
introducing tractable convex relaxations of the set of probability measures with support in K, (if /C 
is semi-algebraic). For more details on this approach the interested reader is referred to Lasserre 
[m [H [18], and [201 [T7] for a comparison between linear programming (LP) and semidefinite 
programming (SDP) relaxations. 

As an alternative, one may obtain a sequence of upper bounds by optimizing over specific classes 
of probability distributions. In particular, Lasserre [19] defined the sequence (also called hierarchy) 
of upper bounds 


(1.2) fk°" ■= min / /(x)cr(x)dx : / cr(x)dx = 1 Y (fc = l,2,...), 

(tseUx] Uk: Jk J 

where Efc[x] denotes the cone of sums of squares (SOS) of polynomials of degree at most 2k. Thus 
the optimization is restricted to probability distributions where the probability density function is 
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an SOS polynomial of degree at most 2k. Lasserre [H] showed that —>■ /min.K as A: —>■ oo (see 
Theorem 12.11 below for a precise statement). In principle this approach works for any compact set 
/C and any polynomial but for practical implementation it requires knowledge of moments of the 
measure cr(x)dx. So in practice the approach is limited to simple sets /C like the Euclidean ball, 
the hypersphere, the simplex, the hypercube and/or their image by a linear transformation. 

In fact computing such upper bounds reduces to computing the smallest generalized eigenvalue 
associated with two real symmetric matrices whose size increases in the hierarchy. For more details 
the interested reader is referred to Lasserre m- In a recent paper, De Klerk et al. [B] have provided 
the first convergence analysis for this hierarchy and shown a bound — /min.M = 0{1/Vk) on the 
rate of convergence. In a related analysis of convergence Romero and Velasco [23] provide a bound 
on the rate at which one may approximate from outside the cone of nonnegative homogeneous 
polynomials (of fixed degree) by the hierarchy of spectrahedra defined in [19] . 

It should be emphasized that it is a difficult challenge in optimization to obtain a sequence 
of upper bounds converging to the global minimum and having a known estimate on the rate of 
convergence. So even if the convergence to the global minimum of the hierarchy of upper bounds 
obtained in [19] is rather slow, and even though it applies to the restricted context of “simple sets”, 
to the best of our knowledge it provides one of the first results of this kind. A notable earlier result 
was obtained for polynomial optimization over the simplex, where it has been shown that brute 
force grid search leads to a polynomial time approximation scheme for minimizing polynomials of 
fixed degree mm When minimizing over the set of grid points in the standard simplex with given 
denominator k, the rate of convergence is in 0(1/k) [Ull] and, for quadratic polynomials (and for 
general polynomials having a rational minimizer), in 0{l/k^) [5]. Grid search over the hypercube 
was also shown to have a rate of convergence in 0(1/k) [3] and, as we will indicate in this paper, a 
stronger rate of convergence in 0(l/fc^) can be shown. Note however that computing the best grid 
point in the hypercube [0,1]" with denominator k requires 0{k^) computations, thus exponential 
in the dimension. 


Contribution. As our main contribution we provide a monotone non-increasing converging se¬ 
quence (//^)fegrij of upper bounds > fmin,ic such that fj? —^ /min.ic as fc —)► oo. The parameters 
can be effectively computed when the set /C C [0,1]" is a “simple set” like, for example, a 
Euclidean ball, sphere, simplex, hypercube, or any linear transformation of them. 

This “hierarchy” of upper bounds is inspired from the one defined by Lasserre in [TB], but with 
the novelty that: 

Computing the upper bounds (f^) does not require solving an SDP or computing the smallest 
generalized eigenvalue of some pair of matrices (as is the case in US]/. It only requires elementary 
calculations (but possibly many of them for good quality bounds). 

Indeed, computing the upper bound f^ only requires finding the minimum in a list of 0(nf) scalars 
(7(7),/3 ))i formed from the moments 7 of the Lebesgue measure on the set K. C [0, 1]" and from the 
coefficients {fa) of the polynomial / to minimize. Namely: 


(1.3) 


:= min fc 






7(?j+a,/9) 


where N denotes the nonnegative integers, /(x) = ^ ’ l^ + Z^I = 

and the scalars 


7(»?,/3) 


r''! . . . ryVn 


(l_^^)/3i...(l_^„)/5.dx, (7y,/3)eN2", 
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are available in closed-form. (Our informal notion of “simple set” therefore means that the moments 
l(ri,0) are known a priori.) 

The upper bound (11.31) has also a simple interpretation as it reads: 


(1.4) 


fu = min 

(r;./3)GN2" 


f dK 

Jk 




mm I y fdfi:fj.G M{K.)k | , 


Jk 

where M[lC)k is the set of probability measures on /C, absolutely continuous with respect to the 
Lebesgue measure on /C, and whose density is a monomial x’'(l — x)^ with (r?,/3) G (Such 

measures are in fact products of (univariate) beta distributions, see Section [40]) This also proves 
that at any point a G [0,1]” one may approximate the Dirac measure (5a with measures of the form 
d/i = x’’(l — x)^ dx (normalized to make then probability measures). 

For the case of the hypercube /C = [0,1]”, we analyze the rate of convergence of the bounds 
and show a rate of convergence in 0{l/\/k) for general polynomials, and in 0{l/k) for quadratic 
polynomials (and general polynomials having a rational minimizer). As a second minor contribution, 
we revisit grid search over the rational points with given denominator k in the hypercube and 
observe that its convergence rate is in 0(l//c^) (which follows as an easy application of Taylor’s 
theorem). However as observed earlier the computation of the best grid point with denominator 
k requires 0{k^) function evaluations while the computation of the parameter requires only 
0{n^) elementary calculations. 


Organization of the paper. We start with some basic facts about the bounds in Section [2] 
and in Section [3] we show their convergence to the minimum of / over the set /C (see Theorem 13. II) . 

In Sectional for the case of the hypercube K. = [0,1]", we analyze the quality of the bounds 

. We show a convergence rate in 0{1/Vk) for the range — /min.K and a stronger convergence 
rate in 0{l/k) when the polynomial / admits a rational minimizer in [0,1]” (see Theorem 14.91) . 
This stronger convergence rate applies in particular to quadratic polynomials (since they have a 
rational minimizer) and Example 14. 101 shows that this bound is tight. When no rational minimizer 
exists the weaker rate follows using Diophantine approximations. So again the main message of this 
paper is that one may obtain non-trivial upper bounds with error guarantees (and converging to 
the global minimum) via elementary calculations and without invoking a sophisticated algorithm. 

In Section!^ we revisit the simple technique which consists of evaluating the polynomial / at all 
rational points in [0, Ij” with given denominator k. By a simple application of Taylor’s theorem 
we can show a convergence rate in 0(l/fc^). However, in terms of computational complexity, 
the parameters are easier to compute. Indeed, for fixed k, computing requires 0(n’^) 
computations (similar to function evaluations), while computing the minimum of / over all grid 
points with given denominator k requires an exponential number A:” of function evaluations. 

In Section |6] we present some additional (simple) techniques to provide a feasible point x G /C 
with value /(x) < /^, once the upper bound has been computed, hence also with an error 
bound guarantee in the case of the box /C = [0, Ij”. This includes, in the case when / is convex, 
getting a feasible point using Jensen inequality (Section 16.11) and, in the general case, taking the 
mode X of the optimal density function (i.e., its global maximizer) (see Section 16.21) . 

In Section [7l we present some numerical experiments, carried out on several test functions on 
the box [0,1]”. In particular, we compare the values of the new bound with the bound 
(whose dehnition uses a sum of squares density), and we apply the proposed techniques to find a 
feasible point in the box. As expected the sos based bound is tighter in most cases but the bound 
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can be computed for much larger values of k. Moreover, the feasible points x returned by the 
proposed mode heuristic are often of very good quality for sufficiently large k. Finally, in Section 
m we conclude with some remarks on variants of the bound fj^ that may offer better results in 
practice. 

2. Notation, definitions and preliminary results 

Throughout we let R[x] denote the ring of polynomials in the variables x = (xi,..., ai„), R[x]d 
is the subspace of polynomials of degree at most d, and S[x](i C R[x] 2 c; is its subset of sums of 
squares (SOS) of degree at most 2d. 

We use the convention that N denotes the set of nonnegative integers, and set N)) := {a G N" : 
Sr=i (—'■ l®l) = similarly := {a £ N" : The notation x“ stands for 

the monomial while (1 — x)“ stands for (1 — • • • (1 — a G N". We will also 

denote [n] = {1, 2,..., n} and let 1 denote the all-ones vector (of suitable size). 

One may write every polynomial / G R[x](; in the monomial basis 

/(x) = ^ /aX“, 

with vector of (finitely many) coefficients {fa)- 

2.1. The bounds and fjf. In [H], Lasserre introduced the parameters J™'* as upper bounds 
for the minimum /min.A: of / over /C and he proved the following result. 

Theorem 2.1 (Lasserre [H]). Let 1C C R." be eompaet, let /min.K: be as in and let 

(2.1) Z™'* :=inf|y f{x.) a{x.) dx : J tT(x) dx = 1, ct G S[x]fc | , fc G N. 

Then /min.K < /|+i < /I"® for all k and 

(2.2) Un,K = lim fr- 

K—^OO 

We will also use the following important result due to Krivine nmi] and Handelman m- 

Theorem 2.2. Let /C = {x ; gj{x) > 0, j = 1, ■ • ■, rn} C R" be a polytope with a nonempty interior 
and where each pj is an affine polynomial, j = 1,..., m. // / G R[x] is strictly positive on K. then 

(2.3) /(x) = ^ Aa5i(x)“i •••5,.„(x)“'", VxGR”, 
for finitely many positive scalars Xa ■ 

We will call the expression in (12.31) the Handelman representation of /, and call any / that allows 
a Handelman representation to be of the Handelman type. Throughout we consider the following 
set of polynomials: 

(2.4) 'Hfc := < p G R[x] : p(x) = ^ A,,,/ 3 x’'(l — x)^ where A ,,/3 > 0 

i (r,,/3)GN|" 

i.e., all polynomials that admit a Handelman representation of degree at most k in terms of the 
polynomials Xi, 1 — Xi defining the hypercube [0,1]”. 

Observe that any term x’'(l — x)^ with degree |p + /3| < k also belongs to the set TLk- This 
follows by iteratively applying the identity: 1 = xi + {1 — Xi), which permits to rewrite x’'(l — x)^ 
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as a conic combination of terms x’’ (1 — x)^ with degree \r]' + /3'| = k. The next claim follows then 
as a direct application. 

Lemma 2.3. We have the inclusion: Jik Q 'Hk+i for all k. 

We may now interpret the new upper bounds from m in an analogous way as the bounds 
jsos fj-Qjjj (12,11) . but where the SOS density function a G Efc[x] is now replaced by a density a GHk- 
For clarity we first repeat the definition m of the parameters below: 


fk ^ min^^ f. 


(?7,/3)GN; 


k cGN" 


l(r}+a,P) 


where the scalars 


[ x''(l-x)^dx= [ {I-xi)^^ ■■■{!-Xn)^’'dxi---dx„, (7?,/3)gN^”, 

d/C d/C 

denote the moments of the Lebesgue measure on the set 1C. Using the fact that 

Y /a7(r)+a,/3) = ^ fc^j x''+“ (1 - x)rfx = / /(x)x’'(1 - x)'^dx, 


aGN"- aGN” 

we can rewrite the parameter fjf as in (El: 


fu = min 


f /(x)x''(l-x)^dx 
JK 


’(1 - x)'^ dx 


We now give yet another reformulation for the parameter f^, where we optimize over density 
functions in the set T-Lk, which turn out to be convex combinations of density functions of the form 
x''(l — x)^ (after suitable scaling). 

Lemma 2.4. Let 1C C [0,1]", let f be a polynomial, and consider the parameters f^, k G N, from 
ESD- Then one has: 

fk = inf I [ /(x) (t(x) dx : [ (T(x)dx = ll forallkcN, 

(d/c JK J 

and the sequence {fk )k Is monotonically non-increasing: fk+i ^ fk ■ 

Proof. Note that, for given k G N, 

inf I J /(x) a(x) dx : J a(x) dx = 1, cr G "Hfc | 

t " \ 1 




= inf 
A>0 




Y Kh / x''+“(l - x)^dx 

(r/./3)GN|" CJS. -„-' 


7(^ + c,/3) 


'Y. / x''(l — x)^ dx = 1 

(r;,/3)GN|" I5. -„-^ 


'y(v,p) 


'Y/ I foLl(ri+a, 


Jr/,/3)GN|’* 


= mm 


(r?,/3)GN2^ 


E 


7(i?,/3) 


P) I ■ E 7(?).^) — 1 


= 

Jk 1 
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where we have used the fact that the penultimate optimization problem is an LP over a simplex 
that attains its infimum at one of the vertices. The monotonicity of the sequence (/^)feeN now 
follows from Lemma [2.31 □ 


2.2. Calculating moments on /C. For /C C [0,1]" a compact set and for every (r/,/?) £ we 
need to calculate the parameters 

(2.5) := [ x’'(l-x)^dx, 

JK 

in order to compute . When /C is arbitrary one does not know how to compute such generalized 
moments. But if /C is the unit hypercube [0,1]", the simplex A := {x : x > 0; < 1}, a 

Euclidean ball (or sphere), and/or their image by a linear mapping, then such moments are available 
in closed-form; see e.g. m- We give the moments for the hypercube 1C = [0,1]", which we will 
treat in detail in this paper. Namely, 

x/‘(l - Xi)^' dx^ , for any (ry,/?) G 
and the univariate integrals may be calculated from 

(2.6) [ t\l - ty dt = f. foranyi,/GN. 

Jo (t+j + iy. 



2.3. The complexity of computing and We let Nf denote the set of indices a G N" 
for which fa 0; note that \Nf\ < if d is the total degree of /. The computation of is 

done by computing the summations: 


H A 

olGN f 


7(r)+a,/3) 

7(i;,/3) 


for all (?7,/3) G N|", and taking the minimum one. (We assume that the values are pre¬ 

computed for all (?7,/3) G Nfe"d-) 

Thus, for fixed {r],f3) G N|", one may first compute the inner product of the vectors with 
components fa and 'y[rj+a,p) (indexed by a). Note that these vectors are of size |1V/|. Since there 
are pairs (ly,/?) G the entire computation requires (2|iV/| -|- flopt0- 

As explained in [H], the computation of the upper bounds may be done by finding the 
smallest generalized eigenvalue A of the system: 

Ax = XBx {x ^ 0), 

for suitable symmetric matrices A and B of order ("'^^). In particular, the rows and columns of 
the two matrices are indexed by N"^, and 

Aa,/3= ^ fs [ x“+'^+'^dx, Ba,i3= [ x“+^dx a,/3GN<fe. 

seNf Jic 


Note that the matrices A and B depend on the moments of the Lebesgue measure on /C, and that 
these moments may be computed beforehand, by assumption. One may compute by taking 


^We define floating point operations (flops) as in [9] p. 18]; in particular, by this definition the inner product of 
two n-vectors requires 2n flops. 
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the inner product of {fs)sGNf with the vector of moments (/^ . Thus computation 

of the elements of A require a total of |A^/| flops. 

Also note that the matrix B is a, positive definite (Gram) matrix. Thus one has to solve a so- 
called symmetric-definite generalized eigenvalue problem, and this may be done in 14flops; 

see e.g. O Section 8.7.2]. Thus one may compute /|°® in at most 14("^^) -|- |7V/| f (”^^) -I- Ij 
flops. 

2.4. An illustrating example. We give an example to illustrate the behaviour of the bounds /|°® 
and f^. More examples will be given in Section [T] 

Example 2.5. As an example we consider the bivariate Styblinski-Tang function 

2 

f{xi,X2) = ^ - 5)"^ - 8(10xi - 5)^ -I- - 5) 

over the square K = [0,1]^, with minimum fmin,K ~ —78.33198 and minimizer 

X* Ri (0.20906466,0.20906466). 

Using a SOS density function, the upper bound of degree 2 is = —12.9249, and the corresponding 
optimal SOS density of degree 2 is (roughly) 

(t{xi,X2) = (1.9169- l.OOSsi - 1.005a;2)^- 

Using a Handelman-type density function, the upper bound of degree 2 is ff^ = —17.3810, with 
corresponding optimal density 

a{xi,X 2 ) = 6 a; 2 (l - 3^2)- 

On the other hand, if we consider densities of degree 6 then we get /|°® = —34.403 and f^ = 
-31.429. 

Thus there is no general ordering between the bounds ff°^ and /^. Having said that, we will 
show in Section^ that, for most of the examples we have considered, one has ff°^ < for all k, 
as one may expect from the relative computational efforts. As a final illustration. Figure [I| shows 
the plot and contour plot of the Handelman-type density corresponding to the bound = —60.536 
(i.e. degree 50). 

The figure illustrates the earlier assertion that the optimal density approximates the Dirac delta 
measure at the minimizer yA « (0.20906466,0.20906466). Indeed, it is clear from the contour plot 
that the mode of the optimal density is close to x*. 


3. Convergence proof for the bounds on /C c [0,1]” 

In this section we prove the convergence of the sequence (/^)feGN to the minimum of / over any 
compact set K C [0,1]". 

Theorem 3.1. Let K C [0,1]", let f S K[x]d and let be as in i2.5]) . Define as in UTS]) the 

parameters 

(3.1) /f = min ^ VfceN. 

Then, fmin.ic = Hm fk- 

k—^oo 


T{v,P) 
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Figure 1. Optimal Handelman-type density cr of degree 50 on [0,1]^ for the bi¬ 
variate Styblinski-Tang function. 


Proof. As in (11.21) . let denote the bound obtained by searching over an SOS density a of degree 
at most 2k: 

= min / /(x)(T(x)(ix such that / cr(x)dx = 1, a G S[x]fc. 

JK JK 

Also recall from Lemma that 

= min / /(x)cr(x)(ix such that / (t(x)c?x = 1, cr g "Hj.. 

JK JK 

By Lemma 1^41 the sequence (/^) is monotone non-increasing, with /min.K < fk k. Hence 

it has a limit which is at least fmin.K^ we now show that the limit is equal to /min.ic- 

To this end, let e > 0. As the sequence {fk°^) converges to /min. a: (Theorem [2T]), there exists an 
integer k such that 

/min./C < fk°^ < /min./C + £• 

Next, there exists a polynomial a € Tik such that tT(x)(ix = 1 and 


/r < / /(x)a(x)dx < /r + G 

JK 


Define now the polynomial ct(x) = ct(x) -|- e. Then a is strictly positive on [0,1]” and thus, by 
Theorem 12.21 applied to the hypercube [0,1]", a G for some integer j/. Observe that 


/ (T(x)dx = / (cr(x) -I- e)dx > / cr(x)dx = 1. 
JK JK JK 


Hence we obtain: 

/k — /min,/C < 


H r , /iC_ /^(/(x) - /min,/c)0-(x)dx 


(5’(x)ds 


/min./C — 


d'(x)dy 


< / (/(x) - /min,K:)o'(x)dx. 


The right most term is equal to 

[ (/(x)-/min,/c)o'(x)dx-he [ (/(x)-/min,/c)dx = [ /(x)cr(x)dx-/min./C+6 [ (/(x) -/min,/c)dx, 
JK JK JK JK 
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where we used the fact that a(x)dx = 1. Finally, combining with the fact that f{x)a{x)dx < 
fr + e < /min, jc + 2e, we can derive that 

ffk ~ /min./C < e (^2 - fn,inx)d^ = 

where C := 2 + /^^-(/(x) — /min,K:)dx is a constant. This concludes the proof. □ 

Note that, in the proof, it was essential to have a strictly positive on all of [0,1]", for the 
application of Handelman’s theorem. The fact that d-(x) = cr(x) + e with a SOS and e > 0 
guaranteed this strict positivity. 

4. Bounding the rate of convergence for the parameters on /C = [0,1]” 

In this section we analyze the rate of convergence of the bounds for the hypercube /C = [0,1]". 
We prove a convergence rate in 0{1/Vk) for the range — /min.K in general, and a stronger 
convergence rate in 0(1/k) when / has a rational global minimizer in [0,1]”, which is the case, for 
instance, when / is quadratic. 

Our main tool will be exploiting some properties of the moments 7 (r),/ 3 ) which, as we recall below, 
arise from the moments of the beta distribution. 


4.1. Properties of the beta distribution. By definition, a random variable X G [0,1] has the 
beta distribution with shape parameters a > 0 and 6 > 0, which is denoted by X ^ beta(a, b), if its 
probability density function is given by 

If a > 1 and b > 1, then the (unique) mode of the distribution (i.e., the maximizer of the density 
function) is 

(4.1) y={a-l)/{a + b-2). 

Moreover, the fc-th moment of X is given by 

a(a + 1) • • • (a + fc — 1) 


(4.2) 


E(X'=) = 


(a + b)(a + b + 1) ■ ■ ■ {a + b + k — 1)^ 
(see, e.g, [TH Chapter 24]; this also follows using (I2.6p l. 


(fc = 1,2,3,...) 


In what follows we will consider families of random random variables with the beta distribution 
of the form X ~ &eta(ar, 6r), where a and b are positive real numbers and r > 1 is an integer. By 
(Oil . any such random variable has mean 


E{X) = 


ar 


ar + br a + b 

In Lemma 14.21 below we show how the moments of such random variables relate to powers of the 
mean. The proof relies on the following technical lemma. 


Lemma 4.1. Let k be a positive integer. There exists a constant Ck > 0 (depending only on k) for 
which the following relation holds: 

rp{rp + 1) ■ ■ ■ (rp + fc - 1) _ ^ ^ ^ 
rq(rq + 1) • • • (rg + /c — 1) q^ ~ r 
for all integers r > 1, and real numbers 0 < p < q. 
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Proof. Consider the univariate polynomial (j){t) = (t + 1) •••(< + A: — 1) = where the 

scalars > 0 depend only on k and Ok-i = 1- Denote by A the left hand side in (14.31) . which can 
be written a.s A = N/D, where we set 

N := rpq'^(j){rp) — rqp^(j){rq), D := rq^^^(j){rq). 


k-2 


i i k—1 

air q p 


=rpq'^ 




i=0 


We first work out the term N: 

/k-2 k-2 

N = rpqiYl 

- qk-l-i __ 

fact that p < q. This implies: 

k-2 

N < rpq{q — p) ^ air'^p^q^~‘^{k — 1 — z) = rpq^~^{q — p) ^ ai(k — 1 — i)r'^p^ =: . 


Write: — pk-i-i _ q 3 pk- 2 -i-j < (g _ p)q^-‘^-'^{k — 1 — i), where we use the 


k-2 


i =0 




Thus we get: 


ELo a^ik-l- iyp^ 


D q^ 4’{rq) 

The first factor is at most 1, since one has: p{q — p) < q^, a.s q^ —p{q — p) = {q — p)^ +pq- Second, 
we bound the sum Ei=o — 1 — i)Pp'^ in terms of (t>{rq) = Ej=o ajT-lg-l. Namely, define the 
constant 

ai{k — 1 — i) 


Ck '■= max 


Q<i<k—2 

Ck 


which depends only on k. We show that 

adk— 1 — < 

r 

Indeed, for each 0 < i < k — 2, using p® < 9 ®+^ and the definition of Ck, we get: 

r • adk - 1 - f)r®p® < adk - 1 - i)r®+E'+^ < Cka^+id+^q^+\ 
Summing over i = 0, l,...,fc — 2 gives: 


k-2 


k-2 


■ ^ ai(fc - 1 - i)r®p® <CkY < Ckddq), 


i=0 


i=o 


and thus 


as desired. 


A y Ck 
A < — < — 
D r 


□ 


Lemma 4.2. For any integer k > 1, there exists a constant Cj. > 0 (depending only on k) for 
which the following holds: 

|E(A'=) - (E(A))'®| < 

I I j, 

for all integers r > 1, real numbers a,b > 0, and where X ^ beta{ar, br). 

Proof. Directly using (14.2L E(A) = and Lemma |4T] applied to p = a and q = a + b. □ 
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Now we consider i.i.d. random variables Xi ,..., Xn such that 
(4.4) Xi ^ beta{air,bir) ai,bi > 0 (i G [n]), r > l,r G N, 

and denote X = {Xi,... ,X„). For given a G N", we denote = 0"=! ■ Since the random 

variables Xi are independent we have E(X“) = and, for a polynomial / = 

the expected value of f{X) = is given by 


(4.5) E(/(X)) = ^ /„E(X“) = ^ Ul[Eixn- 

Recall that the explicit value of E(X“*) is given by (14.21) . The next result relates E(/(X)) (the 
expected value of f{X j) and /(E(X)) (the value of / evaluated at the mean of X). 

Lemma 4.3. Let /(x) = ^ ~ ^ ^n), where the i.i.d. random variables 

Xi (i G [n]) are as in (14.41) . Then there is a constant Cf > 0 (depending on f only) such that 

|E(/(X))-/(E(X))|<^. 

r 

Proof. We have 


2 = 1 


E(/(X)) -/(E(X)) = ^ m[E{xr) -Y[mx,)) 

By the identity: 

n n n 

(4.6) 
one has 


2=1 2=1 2 = 1 


^2 = 1 


2 — 1 n 


{xi - y^) n yj n 


j=l j=i+l 


(x,y gK"), 


nE(v“^) - =E - (e(x*))“o n 


2=1 


2 = 1 


2=1 




j=2+l 


Since E(X 2 ) G [0,1] and E(X"") G [0,1] for any i G [n], it follows that 

n 

|E(/(X)) -/(E(X))| < ^ |/«|^|E(Xr)-(E(X,))“^| 

aCN’^ 2=1 

n (-i! 

^ Ei/«iE^> 

2=1 


where the second inequality is from Lemma 14.21 and the constant > 0 only depends on . 
Setting C/ := X^aGN" l/^l YTt=\ ^'oa concludes the proof. □ 
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4.2. Proof of the convergence rate. Let x* be a global minimizer of / in [0,1]”. Our objective 
is to analyze the rate of convergence of the sequence {f^ — /(x*))^. Our strategy is to define 
suitable shape parameters from the components x* of the global minimizer x* so that, if we 

choose a vector X = {Xi,..., Xn) of i.i.d. random variables with Xi ^ beta{r]*, f]*), then (roughly) 
E(X) Ri X* and E{f{X)) Ri (so that we can use the result of Lemmato estimate — /(x*). 

In a first step we indicate how to define the shape parameters ly*, /?*. For any given integer r > 1 
we will select them of the form rj* = rai , /3* = rbi , where ai , bi are constructed from the coordinates 
of X* . As we want rj* , [3* to be integer valued we need to discuss whether a coordinate Xi is rational 
or not, and to deal with irrational coordinates we will use the following result about Diophantine 
approximations. 


Theorem 4.4 (Dirichlet’s theorem). (See e.g. [24l Chapter 6.1]^ Consider a real number x G M 
and 0 < e < 1. Then there exist integers p and q satisfying 


P 

X - 

q 


e 1 

< - and 1 < q < —■ 

q e 


If X € (0,1), then one may moreover assume 0 < p < g. 


Definition 4.5 (Shape parameters for rational components). Fix an integer r > 1. For rational 
coordinates x* € Q define rj*, (3* as follows: 

(i) If X* =0 then set rj* = 1 and (3* = r. 

(ii) If X* = 1 then set rj* = r and /3* = 1. 

(iii) If X* G Q \ {0,1} then write x* = Pi/qi where 1 < Pi < qi are integers, and set rj* = rpi 

and (3* = r{qi -pi). 


Definition 4.6 (Shape parameters for irrational components). Fix an integer r > 1. For each 
irrational coordinate x* G R\Q, apply Theorem \4-4\ with e = l/r to obtain integers pi,qi satisfying 

Pi 1 

X*- - < —, 0 < Pi < qi < r, and 1 < qi. 

qi rqi 

Define the sets Jq = {f G [n] : x* G R \ Q, Pi = 0}, /i = {i G [n] : x* G R \ Q, Pi = qi}, and 

/ = {i G [n] : X* G R \ Q, I < Pi < qi}, and define rj*, (3* as follows: 

(iv) If i € Iq then set rj* = 1 and (3* = r. 

(v) If i G Ii then set rj* = r and (3* = 1. 

(vi) If i G I then set rj* = rpi and (3* = r{qi — pi). 


As above consider i.i.d. X = (Xi,... ,X„), where Xi ^ beta{ri*, j3*). Then, by construction, for 
all i G [n], one has 




1 

r+1 

in cases (i), (iv), 

r 

r+1 

in cases (ii), (v). 

Pi 

Qi 

in cases (iii), (vi 


One can verify that in all cases one has 
(4.7) |E(Ai) — x*| < l/r for all i G [n]. 

Observe morever that, again by construction, 

L /(x)x’'*-i(l - xf‘-^dx 

^ F > /(X-) 


(4,8) 
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where we let 1 denote the all-ones vector and we define the parameter 

n 

(4.9) + 

i=l 

We will use the following estimate on the parameter kr- 

Lemma 4.7. Consider the parameter kr = J = {i € [n] : x* € Q\{0,1}}. 

Then the following holds: 

(a) If'X* G Q" then kr < ar for all r > 1, where a > 0 is a constant (not depending on r). 

(b) If X* ^ Q” then kr < a'r'^ for all r > 1, where a' > 0 is a constant (not depending on r). 

(c) For r = 1, we have that ki = — 2|J|. 

Proof. By construction, p* + (3* — 2 = rqi — 2 for each i G / U J, and p*+(3* — 2 = r — l otherwise. 

From this one gets kr = Qi+n—\IUJ\) — n—\I\Jj\ =: ar — b, after setting b := n+\IUJ\ 

and a := Qi + n — \I U J\, so that a,b > 0. Thus, kr < ar holds. 

Next, note that qi < r for each i G J, while qi does not depend on r for i G J (since then 
X* = Pi/qi). Hence, in case (a), 7 = 0 and the constant a does not depend on r. In case (b), we 
obtain: a < r\I\ + qi + n — \I U J\ < a'r, after setting a' := |/| + qi + n — \I \J J|, which 
is thus a constant not depending on r. Then, kr < ar < a'r'^. 

In the case r = I, the set I is empty and thus ki — J2ieJ 9 * ~ 2| J|, showing (c). □ 


We can now prove the following upper bound for the range E{f{X)) — /(x*) (thus also for the range 
fk^r ~ /(^*)) '^hich will be crucial for establishing the rate of convergence of the parameters fj?. 

Theorem 4.8. Given a polynomial f of total degree d, consider a global minimizer x* of f in 
[0,1]". Let r be a positive integer. For any x* G [0,1] (i G [n]), consider the parameters rj*, ft* 
as in Definitions \4-.5\ and \4.6\ and i.i.d. random variables Xi ^ beta{p*, (3*). Then there exists a 
constant (7/ > 0 (depending only on f) such that 

/,^-/(x*)<E(/(X))-/(x*)<^, 

r 

where kr is as in ra- 

Proof. The leftmost inequality follows using (14.8p . we show the rightmost one. By Lemma [4.31 one 
has: 


E(/(X))-/(x*) = E(/(X))-/(E(X))+/(E(X))-/(x*) 

< Cf/r + f(E{X))-f{x*), 

where Cj > 0 is a constant that depends on / only. Thus we need only bound f{'E{X)) — /(x*). 
To this end, note that 


/(E(X))-/(x*) 


Using again the identity (14.61) one has 


^ fjf[EiX,) 



n 



< ^ |E(W)-<|<^, 
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where d is the degree of /, and we have used |E(Xi) — x*\ < 1/r, x* G [0,1] and E(Xi) G [0,1] for 
all z G [n]. Setting 

Cf = Cf + dY, \u 

aGN" 

completes the proof. □ 


Finally we can now show the following for the rate of convergence of the sequence /^, which is 
our main result. 


Theorem 4.9. Let f be a polynomial, let x* he a global minimizer of f in [0,1]”, and consider as 
before the parameters 


H ^ . /[o.i]"/W xni-x)^dx 

-x)/5dx 

There exists a constant Mf(depending only on f) such that 


(fc = l,2,...). 


(4.10) 


fk 


/(x*) < 


Ml 

y/k 


for all k> ki, 


where ki = ~ ‘AJ\i J = {z G [n] : x* G Q \ {0,1}} and x* = Pi/qi for integers 

1 < Pi < Qi if i G J . Moreover, if f has at least one rational global minimizer x*, then there exists 
a constant M} (depending only on f) such that 


(4.11) 


rH 


M'f 

/(x*) < for all k > ki. 

fx 


In particular, the convergence rate is in 0{l/k) when f is a quadratic polynomial. 


Proof. Consider an arbitrary integer k > ki. Let r > 1 be the largest integer for which k > kr. Then 
we have kr < k < kr+i. As kr < k, we have the inequality f^ < f^ and thus, by Theorem 14.81 we 
obtain 

/f-/(x*)</,^-/(x*)<^, 
where the constant C/ depends only on /. 

If X* G Q" then, by Lemma H771 fab kr+i < a(r + 1) < 2ar. This implies k < kr+i < 2ar, where 
the constant a does not depend on r. Thus, 

where the constant Mf = 2aCf depends only on /. This shows (14.111) . 

If X* ^ Q" then, by Lemma [4.71 (b), fc^+i < «^(^ + 1)^ < 4aV^. This implies k < fc^+i < 4a'r^ 
and thus ^ where the constant a' does not depend on r. Therefore, 


fk 


/(x*) < ^ < 
r 


2y/fdCf 

y/k 


M'f 

TT’ 


where the constant M} = 2y/a!Cf depends only on /. This shows (14.101) . 

Finally, if / is quadratic then, by a result of Vavasis [25], / has a rational minimizer over the 
hypercube and thus the rate of convergence is 0{l/k). □ 
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Note that the inequalities (14.101) and (14.111) hold for all k > ki, where ki depends only on the 
rational components in (0,1) of the minimizer x*. The constant ki can be in 0(1), e.g., when all 
but 0(1) of these rational components have a small denominator (say, equal to 2). Thus we can, 
for some problem classes, get a bound with an error estimate in polynomial time. 


Example 4.10. Consider the polynomial f = X]r=i ^ = Pj !]"■ Then fminX = 0 is 

attained at x* =0. Using the relations (1^ . (1^ and eiD, it follows that 


fk 


min 7 

(r,,/3)GN|" 


+ 1 

Vi T Pi + 2 


Since rji + Pi < k and rji > 0 (for any i G [n]), we have f(f > 

By this example, there does not exist any S > 0 such that, for any f, f^ — /min.A; = 0(1/A:^''''^). 
Therefore, when a rational minimizer exists, the convergence rate from Theorem \4.9\ in 0(1/k) for 
f^ is tight. 


5. Bounding the rate of convergence for grid search over K . = [0,1]” 

As an alternative to computing f^ on K. = Q := [0,1]", one may minimize / over the regular 
grid: 

Q{k) := {xGQ= [0,1]" I fcxeN"}, 

i.e., the set of rational points in [0,1]" with denominator k. Thus we get the upper bound 

/min,Q(fc) ■— ITill /(^) — /min,Q k = 1,2,... 
xGQ(A;) 

De Klerk and Laurent [3] showed a rate of convergence in 0(1/k) for this sequence of upper bounds: 


(5.1) 


f ^ Hf) ( 

J min,Q(k) jmin,Q ^ I 


d + 1 

V 3 


for any k > d, 


where d is the degree of / and L{f) is the constant 

L{f) = max|/o 




a ! 


We can in fact show a stronger convergence rate in 0{l/k^). 


Theorem 5.1. Let f be a polynomial and let x* be a global minimizer of f in [0,1]". Then there 
exists a constant Cf (depending on f) such that 

fmin.Q(k) - /(x*) < ^ for all k>l. 

Proof. Fix fc > 1. By looking at the grid point in Q{k) closest to x*, there exists h G [0,1]" such 
that X* + h G Q{k) and ||h|| < ■^. Then, by Taylor’s theorem, we have that 

(5.2) fix* + h) = fix*) + h^V/(x*) -f ih^VV(C)h, 

for some point ( lying in the segment [x*, x* + h]c [ 0 , 1 ]". 

Assume first that the global minimizer x* lies in the interior of [0,1]". Then V/(x*) = 0 and 
thus 

nC 

/min,QW - /(X*) < fix* + h) - /(x*) < CJlhU^ < —, 
after setting C := max^g[o_i]n j]V^/(C)ll/2. 
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Assume now that x* lies on the boundary of [0,1]" and let /q (resp., I\, I) denote the set 
of indices i G [n] for which a;* = 0 (resp., x* = 1, x* G (0,1)). Define the polynomial g{y) = 
f{y, 0,..., 0,1,..., 1) (with 0 at the positions i G Iq and 1 at the positions i G Ii) in the variable 
y G Then xj = {x*)i^i is a global minimizer of g over [0, l]l^l which lies in the interior. So we 
may apply the preceding reasoning to the polynomial g and conclude that gmin,Q{k) ~ ^ 

for some constant C" (depending on g and thus on /). As /min,Q(fe) < 9min,Q(k) and /(x*) = g{xj) 
the result follows. □ 

Therefore the bounds /min.Q(is) obtained through grid search have a faster convergence rate than 
the bounds f^. However, for any fixed value of k, for the bound one needs a polynomial number 
0{n^) of computations (similar to function evaluations), while computing the bound /min,Q(fc) 
requires an exponential number fc" of function evaluations. Hence the ‘measure-based’ guided 
search producing the bounds is superior to the brute force grid search technique in terms of 
complexity. 


6. Obtaining feasible points x with /(x) < 

In this section we describe how to generate a point x G /C C [0, Ij" such that /(x) < (or such 
that /(x) < + € for some small e > 0). 

We will discuss in turn: 

• the convex case (and related cases), and 

• the general case. 

6.1. The convex case (and related cases): using the Jensen inequality. Our main tool for 
treating the convex case (and related cases) will be the Jensen inequality. 

Lemma 6.1 (Jensen inequality). IfC C R" is convex, </>: C —>■ R is a convex function, and A G C 
a random variable, then 

0(E(A)) < E(</)(A)). 

Theorem 6.2. Assume that K. C [0, Ij" is closed and convex, and {r],l3) G 

rH ^ //c/W x’?(l-x)^dx 

Let X = {Xi ,..., Xn) be a vector of random variables with Xi ~ beta{r]i + 

Then one has /(E(A)) < f^ in the following cases: 

(1) f is convex; 

(2) / has only nonnegative coefficients; 

(3) / is square-free, i.e., /(x) = X)a6{o,i}» /a^“- 

Proof. The proof uses the fact that, by construction, 

/f =E(/(A)). 

Thus the first item follows immediately from Jensen’s inequality. For the proof of the second item, 
recall that 

n 

/f =E(/(A))= ^ Ul[E{xr) 

agN" i=l 


is such that 

1) Pi + 1) ("i G [n\). 



BOUND-CONSTRAINED POLYNOMIAL OPTIMIZATION USING ONLY ELEMENTARY CALCULATIONS 17 


where we now assume /a > 0 for all a. Since (j){Xi) = X^' is convex on [0,1] {i S [n]), Jensen’s 
inequality yields E(X“’) > [E(Xi)]“Y Thus 

/f > E 

aGN" 

as required. For the third item, where / is assumed square-free, one has 

n 

/f =E(/(X))= ^ Ul[E{Xn 
where all a G {0,1}" so that E(X“*) = [E(Xi)]“‘, and consequently 

/f = ^ umr- 

qGN" 

This completes the proof. □ 

6.2. The general case. 

Sampling. One may generate random samples x G /C from the density a on X using the well-known 
method of conditional distributions (see e.g., [21] Section 8.5.1]). For X = [0,1]", the procedure is 
described in detail in [6l Section 3]. In this way one may obtain, with high probability, a point 
X G X with /(x) < + e, for any given e > 0. (The size of the sample depends on e.) Here 

we only mention that this procedure may be done in time polynomial in n and 1/e; for details the 
reader is referred to jS] Section 3]. 

A heuristic based on the mode. As an alternative, one may consider the heuristic that returns the 
mode (i.e., maximizer) of the density function cr as a candidate solution. By way of illustration, 
recall that in Example 12.51 the mode was a good approximation of the global minimizer for a of 
degree 50; see Figure [TJ The mode may be calculated one variable at a time using (14.11) . 

In SectionjTjbelow, we will illustrate the performance of all the strategies described in this section 
on numerical examples. 


7. Numerical examples 

In this section we will present numerical examples to illustrate the behavior of the sequences of 
upper bounds, and of the techniques to obtain feasible points. 

We consider several well-known polynomial test functions from global optimization (also used in 
i), that are listed in Table lU where we set 

/maxx := max{/(x) : xGX}. 

Note that the Booth and Matyas functions are convex. Note also that the functions have a 
rational minimizer in the hypercube (except the Styblinski-Tang function). 

fU _ f 

We start by listing the relative gaps RG{%) = ^ x 100 for these test functions in 

Table ID for densities with degree up to fc = 50. 
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Table 1. Test functions 


Name 

Formula 

Minimum (/mln.K) 

Maximum (/max.K) 

Search do¬ 
main (AC) 

Booth Func¬ 
tion 

/ = {20x1 + A0 x 2 - ilY + 
(40x1 + 20x2 - 35)^ 

/(0.55, 0.65) - 0 

/{0,0) - 2594 

[0,1]2 

Matyas 

Function 

/ = 0.26[(20xi - 10)^ + 

(20x2 - 10)^] - 0.48(20x1 - 
10)(20 x2 - 10) 

/(0.5,0.5) ^ 0 

/(O,1) = 100 

[0,1]2 

Motzkin 

Polynomial 

~ = (4x1 - 2)*(4 x2 - 

2)2 +(4x1 -2)2(4 x 2 -2)'* - 
3(4x1 - 2)2(4 x2 -2)2 + 1 

/(ii) = /(if) = 
/(fT) = /(f.|) = o 

/(1,1) = 81 

[0,1]2 

Three-Hump 
Camel Func¬ 
tion 

“f ^ 2(10x1 - 5)2 - 

1.05(10x1 - 5)^ + i(10xi - 
5)® + (10xi -5)(10 x2 -5) + 
(10X2 - 5)2 

/(0.5,0.5) ^ 0 

/(I, 1) = 2047.92 

[0,1]2 

Styblinski- 
Tang Func¬ 
tion 

/ = E?=i ^(10x1 - 5)'* - 
8(10x1 - 5)2 + |(10xi - 5) 

/(0.209, . . . ,0.209) - 

-39.16599n 

/(I, . . . , 1) = 125n 

[0,1]” 

Rosenbrock 

Function 

/ = Ellr/ 100(4.096x1+1 - 
2.048 - (4.096x1 - 

2.048)2)2 + (4.096x1 - 

3.048)2 

f( 3048 3048 \ _ ^ 

J ^ 4096 ’ ■ • • ’ 4096 / ~ 

/(o,...,o) 

3905.93(n - 1) 

[0,1]” 


Table 2. Relative gaps of for test functions in Table [TJ 


k 

Booth 

Matyas 

Motzkin 

T-H. Camel 

St.-Tang (n = 2) 

Rosen, (n = 2) 

Rosen, (n = 3) 

Rosen, (n = 4) 

1 

10.8199 

17.3333 

5.1852 

12.9776 

20.0499 

7.7615 

10.1745 

11.0081 

2 

9.6633 

12.0000 

2.7020 

4.2038 

18.5633 

6.0339 

7.7310 

9.3678 

3 

8.2498 

11.0667 

2.7020 

4.2038 

17.2942 

4.5549 

6.8671 

7.7383 

4 

7.0933 

8.8000 

1.5732 

1.9822 

15.8076 

3.8045 

6.1275 

7.1624 

5 

6.6307 

8.1333 

1.5732 

1.9822 

15.0461 

3.6406 

5.2637 

6.6694 

6 

5.8340 

6.9867 

1.2615 

1.1892 

14.2847 

3.3393 

4.4018 

6.0935 

7 

5.5476 

6.5524 

1.2615 

1.1892 

13.8738 

3.0766 

4.0267 

5.5188 

8 

5.0409 

5.9048 

1.1002 

0.8458 

13.4630 

2.6480 

3.7922 

4.9429 

9 

4.8354 

5.6190 

1.1002 

0.8458 

13.2211 

2.5610 

3.4171 

4.3682 

10 

4.5324 

5.2245 

1.0541 

0.6771 

12.9796 

2.3301 

3.2259 

4.1182 

11 

4.2234 

5.0317 

1.0541 

0.6771 

12.6013 

2.2383 

3.0602 

3.9269 

12 

4.0949 

4.7778 

1.0351 

0.5144 

12.1905 

1.9703 

2.8821 

3.6767 

13 

3.8340 

4.6444 

1.0351 

0.5144 

11.8216 

1.9210 

2.7146 

3.4725 

14 

3.6523 

4.4741 

1.0328 

0.4236 

11.5798 

1.7703 

2.6079 

3.2225 

15 

3.4952 

4.3798 

1.0295 

0.4236 

11.3687 

1.6965 

2.4226 

3.0950 

16 

3.3013 

4.2618 

1.0291 

0.3539 

10.9180 

1.5472 

2.2938 

2.9845 

17 

3.2032 

4.1939 

1.0175 

0.3539 

10.5491 

1.5167 

2.1725 

2.8543 

18 

3.0317 

4.1102 

1.0048 

0.3016 

10.1803 

1.4152 

2.0916 

2.7439 

19 

2.9246 

4.0606 

0.9953 

0.3016 

9.9692 

1.3556 

1.9926 

2.6449 

20 

2.8340 

4.0000 

0.9907 

0.2628 

9.7582 

1.2643 

1.9210 

2.5134 

25 

2.3768 

3.4324 

0.9583 

0.2064 

8.7403 

1.0421 

1.5524 

2.0716 

30 

2.0479 

2.8927 

0.9227 

0.1557 

7.7221 

0.8535 

1.3046 

1.7571 

35 

1.7964 

2.5989 

0.8725 

0.1336 

7.0469 

0.7353 

1.1128 

1.5175 

40 

1.6053 

2.2609 

0.8179 

0.1105 

6.3713 

0.6371 

0.9665 

1.3286 

45 

1.4456 

2.0800 

0.7721 

0.0993 

5.8880 

0.5628 

0.8591 

1.1861 

50 

1.3129 

1.8595 

0.7301 

0.0868 

5.4195 

0.5054 

0.7634 

1.0592 


One notices that the observed convergence rate is more-or-less in line with the 0{l/k) bound. 

In a next experiment, we compare the Handelman-type densities {RG{%) by bounds) to 
SOS densities (we still use the notation RG{%) = (/|°® ^min,K) )/(^max,K) ^min,/C ) X 100); we 
also compare their computation times (in seconds), for which we use the approaches described in 
Section and we assume that the values 7 (,,,/ 3 ) for all (??,/3) S ^l+d moments of the 

Lebesgue measure on /C = [0,1]" are computed beforehand; see Tables |3l |4]and[5l We performed 
the computation using Matlab on a Laptop with Intel Core i7-4600U CPU (2.10 GHz) and 8 GB 
RAM. The generalized eigenvalue computation was done in Matlab using the eig function. 
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Table 3. Comparison of two upper bounds for Booth, Matyas and Three-Hump 
Camel functions in relative gaps and computation times (sec.) 


k 

Booth 

Matyas 

Three—Hump Camel 

•^fc/2 

fk 

fk/2 

fk 

•^fc/2 

fk 


time 


time 


time 

ii(S{yo) 

time 

}-CCj[VQ} 

time 

}-CCj[VQ} 

time 

2 

9.433 

0.0007 

9.663 

0.0001 

8.267 

0.0009 

12.0 

0.0001 

12.98 

0.0008 

4.204 

0.0001 

4 

6.264 

0.0006 

7.093 

0.0003 

5.322 

0.0005 

8.8 

0.0003 

1.416 

0.0006 

1.982 

0.0002 

6 

4.564 

0.0008 

5.834 

0.0008 

4.282 

0.0009 

6.987 

0.0007 

1.416 

0.0011 

1.189 

0.0007 

8 

3.764 

0.0015 

5.041 

0.0025 

3.894 

0.0017 

5.905 

0.0018 

0.4678 

0.002 

0.8458 

0.0017 

10 

2.691 

0.0025 

4.532 

0.0038 

3.689 

0.0033 

5.224 

0.0039 

0.4678 

0.0035 

0.6771 

0.0037 

12 

2.45 

0.0047 

4.095 

0.0065 

2.996 

0.0056 

4.778 

0.0074 

0.2168 

0.0086 

0.5144 

0.0063 

14 

1.814 

0.0072 

3.652 

0.0109 

2.547 

0.0102 

4.474 

0.0112 

0.2168 

0.0128 

0.4236 

0.0117 

16 

1.607 

0.0097 

3.301 

0.0177 

2.043 

0.0131 

4.262 

0.0178 

0.1245 

0.0139 

0.3539 

0.0179 

18 

1.319 

0.0146 

3.032 

0.0276 

1.834 

0.0226 

4.11 

0.0266 

0.1245 

0.0377 

0.3016 

0.027 

20 

1.107 

0.0242 

2.834 

0.0391 

1.478 

0.0329 

4.0 

0.0384 

0.08363 

0.0312 

0.2628 

0.0397 


Table 4. Comparison of two upper bounds for Motzkin, Styblinski-Tang (n = 2) 
and Rosenbrock (n = 2) functions in relative gaps and computation times (sec.) 


k 

Motzkin 

Sty.—Tang (n = 2) 

Rosenb. (n = 2) 

fk°2 

fk 

•^fe/2 

fk 

•^fc/2 

fk 

) 

time 

} 

time 

H.(S{yQ) 

time 

}iij{yo) 

time 

H,cj{yo) 

time 

H,cj{yo) 

time 

2 

5.185 

0.0008 

2.702 

0.0001 

19.92 

0.0008 

18.56 

0.0001 

5.495 

0.001 

6.034 

0.0001 

4 

1.31 

0.0005 

1.573 

0.0003 

16.01 

0.0005 

15.81 

0.0002 

3.899 

0.0009 

3.804 

0.0003 

6 

1.31 

0.0009 

1.261 

0.0009 

13.38 

0.0009 

14.28 

0.0008 

2.685 

0.0018 

3.339 

0.0013 

8 

1.024 

0.0016 

1.1 

0.002 

11.23 

0.0016 

13.46 

0.0021 

1.936 

0.0031 

2.648 

0.0034 

10 

0.989 

0.0034 

1.054 

0.0043 

10.12 

0.0028 

12.98 

0.0037 

1.319 

0.0031 

2.33 

0.0057 

12 

0.989 

0.0062 

1.035 

0.006 

8.308 

0.0063 

12.19 

0.0078 

1.07 

0.0049 

1.97 

0.008 

14 

0.8752 

0.0096 

1.033 

0.0168 

6.678 

0.0097 

11.58 

0.0177 

0.7716 

0.0083 

1.77 

0.012 

16 

0.6982 

0.0216 

1.029 

0.0179 

6.009 

0.014 

10.92 

0.0214 

0.6614 

0.0119 

1.547 

0.0237 

18 

0.6982 

0.0242 

1.005 

0.0266 

5.342 

0.0231 

10.18 

0.0358 

0.4992 

0.0198 

1.415 

0.0264 

20 

0.6269 

0.0298 

0.9907 

0.046 

4.36 

0.0286 

9.758 

0.042 

0.4455 

0.0324 

1.264 

0.0383 


Table 5. Comparison of two upper bounds for Rosenbrock functions (n = 3,4) in 
relative gaps and computation times (sec.) 


k 

Rosenb. (n = 3) 

Rosenb. (n = 4) 

•^fc/2 

fk 


fk 

RGi%) 

time 

RG(%) 

time 

RG(%) 

time 

RG{%) 

time 

2 

8.053 

0.0033 

7.731 

0.0001 

8.945 

0.0204 

9.368 

0.0002 

4 

5.046 

0.0009 

6.128 

0.0007 

5.891 

0.0243 

7.162 

0.0017 

6 

3.787 

0.0024 

4.402 

0.0021 

4.577 

0.0111 

6.093 

0.0062 

8 

2.649 

0.0078 

3.792 

0.0054 

3.266 

0.0442 

4.943 

0.0228 

10 

2.152 

0.016 

3.226 

0.0135 

2.686 

0.2087 

4.118 

0.0699 

12 

1.556 

0.0355 

2.882 

0.0244 

2.02 

0.3774 

3.677 

0.1837 

14 

1.305 

0.0811 

2.608 

0.041 

1.73 

0.9121 

3.222 

0.431 

16 

0.9918 

0.1324 

2.294 

0.0684 

1.334 

1.986 

2.985 

1.099 

18 

0.8538 

0.2272 

2.092 

0.1139 

1.169 

4.279 

2.744 

1.92 


As described in Example 12.51 there is no ordering possible in general between and f^, 
but one observes that /^°2 — fk holds in most cases, i.e., the SOS densities usually give better 
bounds for a given degree. One should bear in mind though, that the f ^°2 ^''’6 in general much 
more expensive to compute than f^, as discussed in Section [231 This is not really visible in the 
computational times presented here, since the values of n in the examples are too small. 

Next we consider the strategies for generating feasible points corresponding to the bounds f^, 
as described in Section |6l see Table [H 
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Table 6. Comparing strategies for generating feasible points for Booth, Matyas, 
Motzkin, and Three-Hump Camel functions. Here, x denotes the mode of the 
optimal density. 



Booth 

Matyas 

Motzkin 

Three-H 

. Camel 


Jk 

/(*) 

/(E(X)) 

773 

Jk 

/(*) 

/(E(X)) 

773 

Jk 

/(*) 

Jk 

/(*) 

5 

172.0 

96.222 

17.0 

8.1333 

4.0 

1.460 

1.2743 

1.0 

40.593 

— 

10 

117.571 

96.222 

25.806 

5.2245 

4.0 

2.0408 

0.8538 

1.0 

13.867 

— 

15 

90.6667 

27.580 

7.6777 

4.3798 

4.0 

2.5017 

0.8339 

1.0 

8.6752 

0.273 

20 

73.5152 

9.0 

2.0 

4.0000 

0.16 

0.1111 

0.8025 

1.0 

5.3826 

0 

25 

61.6535 

4.5785 

1.8107 

3.4324 

0.3161 

0.2404 

0.7762 

1.0 

4.2267 

0.1653 

30 

53.1228 

1.6403 

0.41428 

2.8927 

0.0178 

0.0138 

0.7474 

1.0 

3.1892 

0 

35 

46.5982 

1.0923 

0.53061 

2.5989 

0.1071 

0.0897 

0.7067 

0.4214 

2.7367 

0.110 

40 

41.6416 

0.8454 

0.64566 

2.2609 

0 

0 

0.6625 

0.2955 

2.2626 

0 

45 

37.4988 

2.0 

0.80157 

2.0800 

0 

0 

0.6254 

0.1985 

2.0337 

0.0783 

50 

34.0573 

0.9784 

0.22222 

1.8595 

0 

0 

0.5914 

0.1297 

1.7768 

0 


In TablelH the columns marked /(E(X)) refer to the convex case in Theorem l6.2l The columns 
marked /(x) correspond to the mode x of the optimal density; an entry ‘—’ in these columns means 
that the mode of the optimal density was not unique. 

For the convex Booth and Matyas functions f(E{X)) gives the best upper bound. For sufficiently 
large k the mode x gives a better bound than fj^, indicating that this heuristic is useful in the 
non-convex case. 

As a final comparison, we also look at the general sampling technique via the method of con¬ 
ditional distributions; see Tables [7] and HI We present results for the Motzkin polynomial and the 
Three hump camel function. 

Table 7. Sampling results for Motzkin polynomial 



Sample size 10 

Sample size 100 

k 

Jk 

Mean 

Variance 

Minimum 

Mean 

Variance 

Minimum 

5 

1.2743 

0.8330 

0.0466 

0.2790 

1.1590 

4.2023 

0.0525 

10 

0.8538 

0.7005 

0.0800 

0.1862 

0.8435 

0.1448 

0.1149 

15 

0.8339 

0.9063 

0.0153 

0.6069 

0.8465 

0.0932 

0.0593 

20 

0.8025 

0.7704 

0.0336 

0.3826 

0.9326 

1.6454 

0.0040 

25 

0.7762 

0.7995 

0.1014 

0.2433 

0.7493 

0.0717 

0.0722 

30 

0.7474 

1.0104 

1.2852 

0.1091 

0.8290 

0.8620 

0.0522 

35 

0.7067 

0.5930 

0.0981 

0.1940 

0.7647 

1.3012 

0.0016 

40 

0.6625 

0.6967 

0.0497 

0.2867 

0.6028 

0.1371 

0.0021 

45 

0.6254 

0.6258 

0.0500 

0.3548 

0.7007 

0.2242 

0.0090 

50 

0.5914 

0.6244 

0.0718 

0.3000 

0.5782 

0.1406 

0.0154 

Uni 

orm Sample 

4.2888 

37.4427 

0.5290 

3.7397 

53.8833 

0.0492 


Table 8. Sampling results for Three-Hump Camel function 



Sample size 10 

Sample size 100 

k 

f“ 

Mean 

Variance 

Minimum 

Mean 

Variance 

Minimum 

5 

40.593 

91.872 

27065.0 

0.90053 

53.656 

14575.0 

0.58086 

10 

13.867 

11.312 

45.784 

0.8916 

14.273 

382.98 

0.018985 

15 

8.6752 

5.6281 

31.311 

0.21853 

10.373 

778.32 

0.022282 

20 

5.3826 

3.5174 

16.053 

0.43269 

9.4178 

653.27 

0.041752 

25 

4.2267 

10.741 

776.55 

0.59616 

5.0642 

112.61 

0.039463 

30 

3.1892 

2.2515 

8.6915 

0.063265 

2.2096 

6.2611 

0.040845 

35 

2.7367 

1.5032 

1.4626 

0.0085016 

3.0679 

16.47 

0.24175 

40 

2.2626 

1.3941 

1.1995 

0.21653 

2.3431 

17.735 

0.069473 

45 

2.0337 

2.3904 

10.934 

0.57818 

1.8928 

3.6581 

0.050042 

50 

1.7768 

1.664 

3.3983 

0.061995 

1.6301 

1.6966 

0.048476 

Uni 

orm Sample 

306.96 

275366.0 

0.15602 

368.28 

296055.0 

0.59281 
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For each degree k, we use the sample sizes 10 and 100. In Tables [7] and |S] we record the mean, 
variance and the minimum value of these samples. (Recall that the expected value of the sample 
mean equals f^.) We also generate samples uniformly from [0,1]", for comparison. 

The mean of the sample function values approximates reasonably well for sample size 100, 
but less so for sample size 10. Moreover, the mean sample function value for uniform sampling 
from [0, 1 ]” is much higher than f^. Also, the minimum function value for sampling is significantly 
lower than the minimum function value obtained by uniform sampling for most values of k. 


8. Concluding remarks 


One may consider several strategies to improve the upper bounds /^, and we list some in turn. 

• A natural idea is to use density functions that are convex combinations of SOS and 
Handelman-type densities, i.e., that belong to Hk + S[a;]r. for some nonnegative integers 
fc,r. Unfortunately one may show that this does not yield a better upper bound than 
/f}, namely 


, (/ /(x)cr(x)dx: f cr(x)dx = lj 

o-G'Hfc+E[a;]r (J/c J fc j 


k,r e N. 


(We omit the proof since it is straightforward, and of limited interest.) 

• For optimization over the hypercube, a second idea is to replace the integer exponents in 
Handelman representations of the density by more general positive real exponents. (This 
is amenable to analysis since the beta distribution is defined for arbitrary positive shape 
parameters and with its moments available via relation (14.21) .') If we drop the integrality 
requirement for (rj,/3) in the definition of (see (11.31) 1. we obtain the bound: 


/f > ■■= 


min 


E /- 


l{rj+a,P) 


fc G N, 


where is the simplex A^” := {(? 7 ,/I) G + A) = 

As with , when ( 17 ,/3) is such that ^ “ 

E(/(A)) where X = (Ai,..., A„) and Xi ^ beta{r]i + l, A + 1) (* S [n]). Using the moments 
of the beta distribution in (14.2L we obtain 


( 8 . 1 ) 


jbeta _ 


mm 

(r;./3)GA2 


aGW: 


All 


(??* + 1) • • • iVi + ai) 


(bi + A + 2) • • • (r]i + A + + 1)' 


fc G N. 


Thus one may obtain the bounds by minimizing a rational function over a simplex. A 
question for future research is whether one may approximate to any fixed accuracy in 
time polynomial in k and n. (This may be possible, since the minimization of fixed-degree 
polynomials over a simplex allows a PTAS [4] , and the relevant algorithmic techniques have 
been extended to rational objective functions m-) 

One may also use the value of {ri,f3) G A|” that gives fj^ as a starting point in the 
minimization problem dHU), and employ any iterative method to obtain a better upper 
bound heuristically. Subsequently, one may use the resulting density function to obtain 
‘good’ feasible points as described in Section |6l Of course, one may also use the feasible 
points (generated by sampling) as starting points for iterative methods. Suitable iterative 
methods for bound-constrained optimization are described in the books [ami, and the 
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latest algorithmic developments for bound constrained global optimization are surveyed in 
the recent thesis [22] , 

• Perhaps the most promising practical variant of the bound is the following parameter: 


cH 

'r.k 


min 


min 


/ /(x) (x''(l-x)/3)’-dx 
JK _ 

f (x’'(l-x)'5)’'dx 

Jk 


E/^ 




T(r77+o;,r/3) 

T(rr;,r/3) 


for r^k G N. 


Thus, the idea is to replace the density cr{x) = xP{l — x)^/ x’'(l — x)^ dx by the density 

(t{xY/ <j{rY dx for some power r S N. Hence, for r = 1, = fY■ Note that the calcu¬ 

lation of fYk requires exactly the same number of elementary operations as the calculation 
of fY, provided all the required moments are available. (Also note that, for K. = [0,1]", one 
could allow an arbitrary r > 0 since the moments are still available as pointed out above.) 
In Tables [9l [TOl and HH we show some relative gaps for the parameter defined as 

{fYk - /min,K)/(/max,K " fminx) X 100. 


Table 9. Relative gaps of fYk for the Styblinski-Tang function (n = 2) 


k 

r = 1 

r = 2 

r = 3 

r = 4 

r = 5 

1 

20.0499 

20.7931 

21.3190 

21.3190 

21.3190 

2 

18.5633 

18.4184 

18.7040 

19.0470 

19.3665 

3 

17.2942 

17.2522 

16.9793 

16.7974 

16.6631 

4 

15.8076 

15.5176 

15.2511 

14.6398 

14.1912 

5 

15.0461 

14.3517 

14.3645 

13.8452 

13.3692 

6 

14.2847 

13.1855 

12.6361 

12.2758 

12.0074 

7 

13.8738 

12.0519 

10.9113 

10.1182 

9.5355 

8 

13.4630 

10.9180 

9.1831 

7.9606 

7.0636 

9 

13.2211 

10.3381 

8.4528 

7.1660 

6.2416 

10 

12.9796 

9.7582 

7.7221 

6.3713 

5.4195 


Table 10. Relative gaps of fYk for the Rosenbrock function {n = 3) 


k 

r = 1 

r = 2 

r = 3 

r = 4 

r = 5 

1 

10.1745 

9.3107 

8.9356 

8.7536 

8.6603 

2 

7.7310 

6.5571 

6.0674 

5.8142 

5.6807 

3 

6.8671 

5.7557 

5.1021 

4.7091 

4.4890 

4 

6.1275 

4.7220 

3.7699 

3.2404 

2.9126 

5 

5.2637 

3.5090 

3.0196 

2.9302 

2.9826 

6 

4.4018 

2.8821 

2.4570 

1.9388 

1.5359 

7 

4.0267 

2.8901 

2.1273 

1.6465 

1.3623 

8 

3.7922 

2.5456 

1.8554 

1.4301 

1.1273 

9 

3.4171 

2.3701 

1.7074 

1.3206 

1.0798 

10 

3.2259 

2.0283 

1.4251 

1.1250 

0.8966 


A first important observation is that, for fixed k, the values of fYk are not monotonically 
decreasing in r; see e.g. the row A: = 2 in Table 13 Likewise, the sequence fYk is not 
monotonically decreasing in k for fixed r; see, e.g., the column r = 5 in Table fTOl 







































BOUND-CONSTRAINED POLYNOMIAL OPTIMIZATION USING ONLY ELEMENTARY CALCULATIONS 23 


Table 11 . Relative gaps of for the Rosenbrock function (n = 4) 


k 

r = 1 

r = 2 

r = 3 

r = 4 

r = 5 

1 

11.0081 

10.4440 

10.1939 

10.0727 

10.0104 

2 

9.3678 

8.5929 

8.2655 

8.0963 

8.0074 

3 

7.7383 

6.7421 

6.3371 

6.1202 

6.0046 

4 

7.1624 

6.2079 

5.7098 

5.4000 

5.2266 

5 

6.6694 

5.1729 

4.2870 

3.8120 

3.5307 

6 

6.0935 

4.4015 

3.3909 

2.8242 

2.4706 

7 

5.5188 

3.5929 

2.8908 

2.6175 

2.5173 

8 

4.9429 

3.1671 

2.5076 

1.9564 

1.5528 

9 

4.3682 

2.8285 

2.2958 

1.7616 

1.4370 

10 

4.1182 

2.7624 

2.1065 

1.6160 

1.2793 


On the other hand, it is clear from Tables Eiiia and HD that can provide a much 
better bound than for r > 1. 

Since is not monotonically decreasing in r (for fixed k), or in k (for fixed r), one 
has to consider the convergence question. An easy case is when 1C = [0,1]" and the global 
minimizer x* is rational. Say x* = ^ {i € [n]), setting g* = 1 and pi = x* when x* € {0,1}. 
Consider the following variation of the parameters ij*, /3* from Definition 14.51 ij* = rpi + 1 
and P* = r{qi - k) + 1 for i e [n], so that V* + /5* - 2 = r(X(r=i 9*)- Combining 

relation (14.Sp and Theorem 14.81 we can conclude that the following inequality holds: 

f^k - /(x*) < ^ for all k > X;r=i ft and r > 1, 

where C/ is a constant that depends on / only. 

For more general sets /C, one may ensure convergence by considering instead the following 
parameter (for fixed R G N): 


min^</f (fceN). 

rG[R] 

Then convergence follows from the convergence results for Moreover, this last param¬ 
eter may be computed in polynomial time if k is fixed, and R is bounded by a polynomial 
in n. 
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